432 Class 22

Thomas E. Love, Ph.D.

2024-04-04

Today’s Topic

Cox models for time-to-event data

  • Returning to the breast cancer trial
  • Using cph from rms to fit a Cox model

This material is discussed in Chapters 29-31 of our Course Notes

Replicable Research and the Crisis in Science

  • Some reminders from the ASA’s 2019 Statement on Statistical Inference in the 21st Century

Setup

knitr::opts_chunk$set(comment=NA)
options(width = 80)

library(janitor)
library(broom)
library(gt)
library(rms)
library(survival)
library(survminer)
library(tidyverse)

theme_set(theme_bw())

Our breast cancer data

brca <- read_csv("c22/data/brca.csv", show_col_types = FALSE) |> 
  mutate(across(where(is_character), as_factor),
         subject = as.character(subject))

head(brca)
# A tibble: 6 × 5
  subject treat  trial_weeks last_alive   age
  <chr>   <fct>        <dbl>      <dbl> <dbl>
1 A01     S_CT           102          0    55
2 A02     S_IT           192          0    62
3 A03     S_Both          73          0    72
4 A04     S_CT            58          1    48
5 A05     S_CT            48          1    26
6 A06     S_IT           182          1    52

Recap of Class 21

Data from a trial of three treatments for breast cancer

  • brca tibble with treat = S_CT, S_IT, S_Both and age at baseline
  • Time-to-event data are gathered in trial_weeks and last_alive, which we used to create a survival object, S.
  • Created a Kaplan-Meier estimate, kmfit, to compare results across treat groups
  • Then built a Cox model for treatment, called mod_T.

What Will We Do Now?

  • incorporate the covariate (age) into the model
  • use cph from the rms package to fit a Cox model that incorporates some non-linearity

Create survival object

  • trial_weeks: time in the study, in weeks, to death or censoring
  • last_alive: 1 if alive at last follow-up (and thus censored), 0 if dead

So last_alive = 0 if the event (death) occurs.

brca$S <- with(brca, Surv(trial_weeks, last_alive == 0))

head(brca$S)
[1] 102  192   73   58+  48+ 182+
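To confirm the event coding, we can rebuild the survival object for just the six subjects shown in head(brca), with values copied from the printout: times for subjects with last_alive == 0 (the deaths) print without a "+", while censored times get one.

```r
library(survival)

# Six subjects from head(brca): last_alive == 0 marks the event (death)
tw <- c(102, 192, 73, 58, 48, 182)
la <- c(0, 0, 0, 1, 1, 1)

S6 <- Surv(tw, la == 0)
S6  # 102 192 73 58+ 48+ 182+ (the "+" marks censored times)
```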

Fit Cox Model mod_T: Treatment alone

mod_T <- coxph(S ~ treat, data = brca)
mod_T
Call:
coxph(formula = S ~ treat, data = brca)

               coef exp(coef) se(coef)      z     p
treatS_IT   -0.5832    0.5581   0.6088 -0.958 0.338
treatS_Both -0.8313    0.4355   0.6547 -1.270 0.204

Likelihood ratio test=1.75  on 2 df, p=0.4164
n= 31, number of events= 15 
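The exp(coef) column is just the exponentiated coef column. For example, for the S_IT row (relative to the reference arm, S_CT):

```r
# Hazard ratio for S_IT vs the reference arm S_CT, from the printed log-HR
exp(-0.5832)  # ~0.558, the exp(coef) value shown above
```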

Fit Cox Model mod_AT: Age + Treatment

mod_AT <- coxph(S ~ age + treat, data = brca)
mod_AT
Call:
coxph(formula = S ~ age + treat, data = brca)

                coef exp(coef) se(coef)      z      p
age          0.07807   1.08119  0.03672  2.126 0.0335
treatS_IT   -0.31161   0.73227  0.60936 -0.511 0.6091
treatS_Both -0.59960   0.54903  0.65741 -0.912 0.3617

Likelihood ratio test=6.99  on 3 df, p=0.07224
n= 31, number of events= 15 

Coefficients of mod_AT

tidy(mod_AT, exponentiate = TRUE, conf.int = TRUE) |>
  select(term, estimate, std.error, conf.low, conf.high) |>
  gt() |> fmt_number(decimals = 3) |> tab_options(table.font.size = 20)
term estimate std.error conf.low conf.high
age 1.081 0.037 1.006 1.162
treatS_IT 0.732 0.609 0.222 2.417
treatS_Both 0.549 0.657 0.151 1.992
  • If Harry and Sally receive the same treat but Harry is one year older, the model estimates Harry will have 1.08 times the hazard of Sally (95% CI 1.01, 1.16).
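Because coefficients act on the log-hazard scale, the age effect multiplies for larger age gaps. Using the printed log-HR of 0.07807 per year:

```r
b_age <- 0.07807   # log hazard ratio per year of age (from mod_AT)

exp(b_age)         # 1-year gap: ~1.08, the estimate quoted above
exp(10 * b_age)    # 10-year gap: hazard ratios multiply, ~2.18
```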

Coefficients of mod_AT

  • If Cyrus receives S_IT and Sally receives S_CT, and they are the same age, the model estimates Cyrus will have 0.73 times the hazard of Sally (95% CI 0.22, 2.42).
  • If Barry receives S_Both and Sally receives S_CT, and they are the same age, the model estimates Barry will have 0.55 times the hazard of Sally (95% CI 0.15, 1.99).

Comparing the Two Models

n = 31, nevent = 15 for each model.

bind_rows(glance(mod_T), glance(mod_AT)) |>
    mutate(model = c("mod_T", "mod_AT")) |>
    select(model, p.value.log, concordance, r.squared, 
           max_r2 = r.squared.max, AIC, BIC) |> 
  gt() |> fmt_number(decimals = 3) |> tab_options(table.font.size = 20)
model p.value.log concordance r.squared max_r2 AIC BIC
mod_T 0.416 0.577 0.055 0.944 91.773 93.189
mod_AT 0.072 0.701 0.202 0.944 88.536 90.660

What do the glance results indicate?
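The AIC and BIC in the table can be recovered by hand from the partial log-likelihoods printed with each model (k = number of coefficients; for Cox models, survival's BIC uses the number of events, here 15, as the effective sample size):

```r
ll <- c(mod_T = -43.886, mod_AT = -41.268)  # partial log-likelihoods
k  <- c(2, 3)                               # coefficients per model

aic <- -2 * ll + 2 * k         # 91.772, 88.536 (as in the table)
bic <- -2 * ll + k * log(15)   # 93.189, 90.660
```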

Likelihood Ratio ANOVA

Comparing the mod_AT model with age and treatment to the mod_T model with treatment alone…

anova(mod_AT, mod_T)
Analysis of Deviance Table
 Cox model: response is  S
 Model 1: ~ age + treat
 Model 2: ~ treat
   loglik Chisq Df Pr(>|Chi|)  
1 -41.268                      
2 -43.886 5.237  1    0.02211 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What does this suggest? Does this match with what AIC and BIC suggested?
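The chi-square statistic above is just twice the gap between the two partial log-likelihoods, on 1 df for the added age term:

```r
ll_T  <- -43.886   # log-likelihood, treatment only (mod_T)
ll_AT <- -41.268   # log-likelihood, age + treatment (mod_AT)

lr <- 2 * (ll_AT - ll_T)                 # 5.236, matching Chisq in the table
pchisq(lr, df = 1, lower.tail = FALSE)   # ~0.022
```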

Graphical PH Check for mod_AT

ggcoxzph(cox.zph(mod_AT))

Using cph from the rms package

Using rms::cph to fit a fancier model, cph_AxT

brca <- read_csv("c22/data/brca.csv", show_col_types = FALSE) |> 
  mutate(across(where(is_character), as_factor),
         subject = as.character(subject)) # reload without S

d <- datadist(brca)
options(datadist="d")

brca$S <- with(brca, Surv(trial_weeks, last_alive == 0))

cph_AxT <- cph(S ~ rcs(age, 4) + treat + age %ia% treat, 
               data = brca, 
               x = TRUE, y = TRUE, surv = TRUE)
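A quick degrees-of-freedom check on this specification, based on the usual rms conventions (an rcs with 4 knots contributes 3 coefficients, and %ia% builds a restricted interaction using only the linear part of age):

```r
df_spline <- 4 - 1   # rcs(age, 4): 4 knots -> 3 terms (age, age', age'')
df_treat  <- 3 - 1   # three treatment arms -> 2 indicators
df_ia     <- 1 * 2   # linear age x 2 treatment indicators

df_spline + df_treat + df_ia   # 7, matching d.f. in the model output
```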

cph_AxT results

cph_AxT
Cox Proportional Hazards Model

cph(formula = S ~ rcs(age, 4) + treat + age %ia% treat, data = brca, 
    x = TRUE, y = TRUE, surv = TRUE)

                        Model Tests    Discrimination    
                                              Indexes    
Obs         31    LR chi2     11.66    R2       0.332    
Events      15    d.f.            7    R2(7,31) 0.140    
Center 19.2233    Pr(> chi2) 0.1123    R2(7,15) 0.267    
                  Score chi2  11.89    Dxy      0.488    
                  Pr(> chi2) 0.1042                      

                   Coef    S.E.   Wald Z Pr(>|Z|)
age                 0.4016 0.2610  1.54  0.1239  
age'               -1.2521 0.7528 -1.66  0.0963  
age''               2.7316 1.5490  1.76  0.0778  
treat=S_IT          5.0537 6.1625  0.82  0.4122  
treat=S_Both        4.9327 6.6650  0.74  0.4592  
age * treat=S_IT   -0.1011 0.1072 -0.94  0.3455  
age * treat=S_Both -0.1006 0.1157 -0.87  0.3846  

Effects Plot

plot(summary(cph_AxT))

Effects Summary

summary(cph_AxT)
             Effects              Response : S 

 Factor              Low  High Diff. Effect   S.E.    Lower 0.95 Upper 0.95
 age                 49.5 61.5 12     0.48200 0.99998 -1.47790    2.4419   
  Hazard Ratio       49.5 61.5 12     1.61930      NA  0.22811   11.4950   
 treat - S_IT:S_CT    1.0  2.0 NA    -0.40504 0.78888 -1.95120    1.1411   
  Hazard Ratio        1.0  2.0 NA     0.66695      NA  0.14210    3.1303   
 treat - S_Both:S_CT  1.0  3.0 NA    -0.49745 0.80805 -2.08120    1.0863   
  Hazard Ratio        1.0  3.0 NA     0.60808      NA  0.12478    2.9633   

Adjusted to: age=54 treat=S_CT  
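In this summary, the age row contrasts the upper and lower quartiles of age (61.5 vs 49.5), and each Hazard Ratio row is just the exponentiated effect on the line above it:

```r
# Hazard ratio for age 61.5 vs 49.5, from the printed log-scale effect
exp(0.48200)  # ~1.62, matching the Hazard Ratio row for age
```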

Validation of model summaries

set.seed(4321234)
validate(cph_AxT)

Divergence or singularity in 12 samples
      index.orig training    test optimism index.corrected  n
Dxy       0.4883   0.6194  0.3722   0.2472          0.2411 28
R2        0.3320   0.5047  0.1632   0.3415         -0.0096 28
Slope     1.0000   1.0000  0.3040   0.6960          0.3040 28
D         0.1191   0.2346  0.0482   0.1864         -0.0673 28
U        -0.0223  -0.0220  1.7430  -1.7650          1.7427 28
Q         0.1414   0.2566 -1.6949   1.9514         -1.8100 28
g         1.9803   9.9533  1.0352   8.9181         -6.9377 28

ANOVA for cph_AxT model

anova(cph_AxT)
                Wald Statistics          Response: S 

 Factor                                     Chi-Square d.f. P     
 age  (Factor+Higher Order Factors)         7.71       5    0.1727
  All Interactions                          0.96       2    0.6175
  Nonlinear                                 3.73       2    0.1548
 treat  (Factor+Higher Order Factors)       2.58       4    0.6297
  All Interactions                          0.96       2    0.6175
 age * treat  (Factor+Higher Order Factors) 0.96       2    0.6175
 TOTAL NONLINEAR + INTERACTION              3.74       4    0.4423
 TOTAL                                      8.55       7    0.2868

survplot in rms for age comparison

survplot(cph_AxT, age = c(35, 45, 55, 65),
         time.inc = 26, type = "kaplan-meier",
         xlab = "Study Survival Time in weeks")

survplot for treat comparison

survplot(cph_AxT, treat, 
         time.inc = 26, type = "kaplan-meier",
         xlab = "Study Survival Time in weeks")

Plotting age effect in cph_AxT

ggplot(Predict(cph_AxT, age))

Plotting treat effect in cph_AxT

ggplot(Predict(cph_AxT, treat))

cph_AxT nomogram

Suppose I want to show 4-year survival rates at the bottom of the nomogram. 4 years is 208 weeks, and weeks are the time unit the model works in, so we have…

sv <- Survival(cph_AxT)
surv4 <- function(x) sv(208, lp = x)

plot(nomogram(cph_AxT,
              fun = surv4,
              funlabel = c("4 year survival")))


Proportional Hazards Assumption?

cox.zph(cph_AxT, transform = "km", global = TRUE)
               chisq df     p
rcs(age, 4)     1.98  3 0.576
treat           4.38  2 0.112
age %ia% treat  5.71  2 0.058
GLOBAL         10.67  7 0.154

Proportional Hazards Assumption?

ggcoxzph(cox.zph(cph_AxT))

More Cox Model Diagnostic Plots?

  • survminer has a function called ggcoxdiagnostics(), which plots several types of residuals against time, the linear predictor, or observation id.
    • See the next slide for the default graph (martingale residuals).
    • Available diagnostics are specified with the type parameter, with options…
type = c("martingale", "deviance", "score", "schoenfeld", 
        "dfbeta", "dfbetas", "scaledsch", "partial")

Diagnostics from survminer

ggcoxdiagnostics(cph_AxT)

More on Survival Analysis?

Our department teaches an entire course on this subject every Spring (PQHS 435).

Some Reminders from the ASA 2019 Statement on Statistical Inference

Moving to a World Beyond “p < 0.05”

  1. Getting to a Post “p < 0.05” Era
  2. Interpreting and Using p
  3. Supplementing or Replacing p
  4. Adopting more holistic approaches
  5. Reforming Institutions: Changing Publication Policies and Statistical Education
  • Long list of “things to do” in Section 7 of the main editorial.

Statistical Inference in the 21st Century

ATOM: Accept uncertainty. Be Thoughtful, Open and Modest.

  • Statistical methods do not rid data of their uncertainty.

ATOM: Accept uncertainty. Be Thoughtful, Open and Modest.

We can make acceptance of uncertainty more natural to our thinking by accompanying every point estimate in our research with a measure of its uncertainty such as a standard error or interval estimate. Reporting and interpreting point and interval estimates should be routine.

ATOM: Accept uncertainty. Be Thoughtful, Open and Modest.

How will accepting uncertainty change anything? To begin, it will prompt us to seek better measures, more sensitive designs, and larger samples, all of which increase the rigor of research.

It also helps us be modest … [and] leads us to be thoughtful.


The nexus of openness and modesty is to report everything while at the same time not concluding anything from a single study with unwarranted certainty. Because of the strong desire to inform and be informed, there is a relentless demand to state results with certainty. Again, accept uncertainty and embrace variation in associations and effects, because they are always there, like it or not.

ATOM: Accept uncertainty. Be Thoughtful, Open and Modest.

Understand that expressions of uncertainty are themselves uncertain. Accept that one study is rarely definitive, so encourage, sponsor, conduct, and publish replication studies.

Be modest by encouraging others to reproduce your work. Of course, for it to be reproduced readily, you will necessarily have been thoughtful in conducting the research and open in presenting it.

On “Practical Benefit”

Switch from reliance on statistical or practical significance to the more stringent statistical criterion of practical benefit for:

    1. assessing whether applied research findings indicate that an intervention is effective and should be adopted and scaled—particularly in complex organizations such as schools and hospitals and
    2. determining whether relationships are sufficiently strong and explanatory to be used as a basis for setting policy or practice recommendations.

On “Practical Benefit”

Require that applied research reveal the actual unadjusted means/medians of results for all groups and subgroups, and that review panels take such data into account, as opposed to only reporting relative differences between adjusted means/medians.

So let’s do it!

Ten of the Many Insights from the “Authors’ Suggestions”

  1. Do not use p-values, unless you have clearly thought about the need to use them and they still seem the best choice.
  2. Develop and share teaching materials, software, and published case examples to help with all of the do’s above, and to spread progress in one discipline to others.
  3. Ask quantitative questions and give quantitative answers.

Ten of the Many Insights from the “Authors’ Suggestions”

  4. Understand that subjective judgments are needed in all stages of a study.
  5. Do not dichotomize, but embrace variation. Report and interpret inferential statistics like the p-value in a continuous fashion; do not use the word “significant.” Interpret interval estimates as “compatibility intervals,” showing effect sizes most compatible with the data, under the model used to compute the interval; do not focus on whether such intervals include or exclude zero.

Ten of the Many Insights from the “Authors’ Suggestions”

  6. Evaluate the strength of empirical evidence based on the precision of the estimates and the plausibility of the modeling choices. Seek out subject matter expertise when evaluating the importance and the strength of empirical evidence. Evaluate the importance of statistical results based on their practical implications.

Ten of the Many Insights from the “Authors’ Suggestions”

  7. Be transparent in the number of outcome variables that were analyzed.
  8. Clearly describe data values that were excluded from analysis and the justification for doing so.
  9. Provide sufficient details on experimental design so that other researchers can replicate the experiment.
  10. Formulate a clear objective for variable inclusion in regression procedures.

What’s Next?

Using a tidymodels approach to fit linear models.